43 research outputs found

Formalization of the Czech morphology system with respect to automatic processing of Czech texts

Author: Hlaváčová Jaroslava
Publication venue: Univerzita Karlova, Filozofická fakulta
Publication date: 01/01/2009

Detailed morphological description of word forms is one of the most important conditions for successful automatic processing of linguistic data. The system of categories and their values used for the description is the subject of the first part of the thesis. The basic principle, the so-called Golden Rule of Morphology, states that every word form has to be described by the system unambiguously. The existence of variants of word forms and of whole paradigms, however, complicates the fulfilment of this rule. We introduce so-called mutations as an extension of variants, so that other sets of word forms with the same description can be included (for instance, multiple word forms of Czech personal pronouns). We divide mutations into two kinds: global ones, describing all word forms of a paradigm, and inflectional ones, for description at the word-form level. This division enables us to express their various combinations. We do not use features of style for the mutation division, for they are subjective. With consistent use of the categories called Inflectional Mutation and Global Mutation, the Golden Rule of Morphology will always be valid. The concept of multiple lemma is introduced in a chapter dealing with lemmatization. It describes lemma variants. We give a detailed description of so-called compounds (složeniny), i.e. word forms such as zač, proň, koupilas, koliks. For their lemmatization we likewise use the concept of multiple lemma. We divide them into several types according to the parts of speech of their components. We also deal with the problem of searching for them in...

Institute of the Czech National Corpus, Faculty of Arts

Asbury Theological Seminary

CU Digital Repository

Formalization of the Czech morphology system with respect to automatic processing of Czech texts

Author: Hlaváčová Jaroslava
Publication venue: Univerzita Karlova, Filozofická fakulta
Publication date: 01/01/2009

CU Digital Repository

Machine Translation of Medical Texts in the Khresmoi Project

Author: Dušek Ondřej
Hajič Jan
Hlaváčová Jaroslava
Novák Michal
Pecina Pavel
Rosa Rudolf
Tamchyna Aleš
Urešová Zdeňka
Zeman Daniel
Publication date: 01/01/2014

The WMT 2014 Medical Translation Task poses an interesting challenge for Machine Translation (MT). In the standard translation task, the end application is the translation itself. In this task, the MT system is considered a part of a larger system for cross-lingual information retrieval (IR)

Crossref

Biblio at Institute of Formal and Applied Linguistics

Adaptation of machine translation for multilingual information retrieval in the medical domain

Author: Dušek Ondřej
Goeuriot Lorraine
Hajič Jan
Hlaváčová Jaroslava
Jones Gareth J.F.
Kelly Liadh
Leveling Johannes
Mareček David
Novák Michal
Pecina Pavel
Popel Martin
Rosa Rudolf
Tamchyna Aleš
Urešová Zdeňka
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

Objective. We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR. Methods and Data. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system. The IR system is based on the BM25 retrieval model implemented in the Lucene search engine. The MT techniques employed in this work include in-domain training and tuning, intelligent training data selection, optimization of phrase table configuration, compound splitting, and exploiting synonyms as translation variants. The IR methods include morphological normalization and using multiple translation variants for query expansion. The experiments are performed and thoroughly evaluated on three language pairs: Czech–English, German–English, and French–English. MT quality is evaluated on data sets created within the Khresmoi project and IR effectiveness is tested on the CLEF eHealth 2013 data sets. Results. The search query translation results achieved in our experiments are outstanding: our systems outperform not only our strong baselines, but also Google Translate and Microsoft Bing Translator in direct comparison carried out on all the language pairs. The baseline BLEU scores increased from 26.59 to 41.45 for Czech–English, from 23.03 to 40.82 for German–English, and from 32.67 to 40.82 for French–English. This is a 55% improvement on average. In terms of the IR performance on this particular test collection, a significant improvement over the baseline is achieved only for French–English. For Czech–English and German–English, the increased MT quality does not lead to better IR results. Conclusions. Most of the MT techniques employed in our experiments improve MT of medical search queries. In particular, the intelligent training data selection proves to be very successful for domain adaptation of MT. Certain improvements are also obtained from German compound splitting on the source language side. Translation quality, however, does not appear to correlate with the IR performance: better translation does not necessarily yield better retrieval. We discuss in detail the contribution of the individual techniques and state-of-the-art features and provide future research directions.

Crossref

Hal - Université Grenoble Alpes

Irish Universities

DCU Online Research Access Service

Biblio at Institute of Formal and Applied Linguistics

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

Author: Attia Mohammed
Badmaeva Elena
Banerjee Esha
Burchardt Aljoscha
Cinková Silvie
de Marneffe Marie-Catherine
de Paiva Valeria
Droganova Kira
Elkahky Ali
Fernández Alcalde Héctor
Ginter Filip
Gökırmak Memduh
Habash Nizar
Hajič Jan
Hajič jr., Jan
Harris Kim
Hlaváčová Jaroslava
Kanayama Hiroshi
Kanerva Jenna
Kayadelen Tolga
Kettnerová Václava
Kirchner Jesse
Kwak Sookyoung
Lando Tatiana
Lertpradit Saran
Leung Herman
Li Josie
Luotolahti Juhani
Macketanz Vivien
Mandl Michael
Manning Christopher D.
Manurung Ruli
Marheinecke Katrin
Martínez Alonso Héctor
Mendonça Gustavo
Missilä Anna
Nedoluzhko Anna
Nitisaroj Rattima
Nivre Joakim
Ojala Stina
Petrov Slav
Pitler Emily
Popel Martin
Potthast Martin
Pyysalo Sampo
Reddy Siva
Rehm Georg
Sanguinetti Manuela
Schuster Sebastian
Shimada Atsuko
Simi Maria
Stella Antonio
Straka Milan
Strnadová Jana
Sulubacak Umut
Taji Dima
Tyers Francis
Urešová Zdeňka
Uszkoreit Hans
Yu Zhuoran
Zeman Daniel
Çöltekin Çağrı
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2017

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems. Peer reviewed

Crossref

Archivio della Ricerca - Università di Pisa

Biblio at Institute of Formal and Applied Linguistics

Helsingin yliopiston digitaalinen arkisto

Institutional Research Information System University of Turin

Khresmoi Professional: Multilingual Semantic Search for Medical Professionals

Author: Aswani Niraj
Beckers Thomas
Birngruber Erich
Boyer Célia
Burner Andreas
Bystroň Jakub
Choukri Khalid
Cruchet Sarah
Cunningham Hamish
Dolamic Ljiljana
Donner René
Dungs Sebastian
Dušek Ondřej
Dědek Jan
Eggel Ivan
Foncubierta Antonio
Fuhr Norbert
Funk Adam
Gaudinat Arnaud
Georgiev Georgi
Gobeill Julien
Goeuriot Lorraine
Gomez Paz
Greenwood Mark
Gschwandtner Manfred
Hajič Jan
Hanbury Allan
Herrera Alba
Hlaváčová Jaroslava
Holzer Markus
Jones Gareth
Jordan Matthias
Jordán Blanca
Kaderk Klemens
Kainberger Franz
Kelly Liadh
Kriewel Sascha
Kritz Marlene
Langs Georg
Lawson Nolan
Leveling Johannes
Mareček David
Markonis Dimitrios
Martínez Iván
Masselot Alexandre
Mazo Hélène
Momtchev Vassil
Müller Henning
Novák Michal
Palotti João
Pecina Pavel
Pentchev Konstantin
Petrak Johann
Peychev Deyan
Pletneva Natalia
Popel Martin
Pottecher Diana
Roberts Angus
Rosa Rudolf
Ruch Patrick
Sachs Alexander
Samwald Matthias
Schneller Priscille
Stefanov Veronika
Tamchyna Aleš
Tinte Miguel
Urešová Zdeňka
Vargas Alejandro
Vishnyakova Dina
Publication date: 01/01/2013

There is increasing interest in and need for innovative solutions to medical search. In this paper we present the EU-funded Khresmoi medical search and access system, currently in year 3 of 4 of development across 12 partners. The Khresmoi system uses a component-based architecture housed in the cloud to allow for the development of several innovative applications to support target users' medical information needs. The Khresmoi search systems based on this architecture have been designed to support the multilingual and multimodal information needs of three target groups: the general public, general practitioners and consultant radiologists. In this paper we focus on the presentation of the systems to support the latter two groups using semantic, multilingual text and image-based (including 2D and 3D radiology images) search

University of Essex Research Repository

Biblio at Institute of Formal and Applied Linguistics

Khresmoi: Multimodal Multilingual Medical Information Search

Author: Aswani Niraj
Beckers Thomas
Birngruber Erich
Boyer Célia
Burner Andreas
Bystroň Jakub
Choukri Khalid
Cruchet Sarah
Cunningham Hamish
Dolamic Ljiljana
Donner René
Dungs Sebastian
Dědek Jan
Eggel Ivan
Foncubierta-Rodríguez Antonio
Fuhr Norbert
Funk Adam
García Seco de Herrera Alba
Gaudinat Arnaud
Georgiev Georgi
Gobeill Julien
Goeuriot Lorraine
Greenwood Mark
Gschwandtner Manfred
Gómez Paz
Hajič Jan
Hanbury Allan
Hlaváčová Jaroslava
Holzer Markus
Jones Gareth
Jordan Blanca
Jordan Matthias
Kaderk Klemens
Kainberger Franz
Kelly Liadh
Kritz Marlene
Langs Georg
Lawson Nolan
Markonis Dimitrios
Martinez Ivan
Masselot Alexandre
Mazo Hélène
Momtchev Vassil
Kriewel Sascha
Müller Henning
Pecina Pavel
Pentchev Konstantin
Peychev Deyan
Pletneva Natalia
Pottecher Diana
Roberts Angus
Ruch Patrick
Samwald Matthias
Schneller Priscille
Stefanov Veronika
Tinte Miguel A
Urešová Zdeňka
Vargas Alejandro
Vishnyakova Dina
Publication venue: 'IOS Press'
Publication date: 01/01/2012

Khresmoi is a European Integrated Project developing a multilingual multimodal search and access system for medical and health information and documents. It addresses the challenges of searching through huge amounts of medical data, including general medical information available on the internet, as well as radiology data in hospital archives. It is developing novel semantic search and visual search techniques for the medical domain. At the MIE Village of the Future, Khresmoi proposes to have two interactive demonstrations of the system under development, as well as an overview oral presentation and potentially some poster presentation

University of Essex Research Repository

Biblio at Institute of Formal and Applied Linguistics

Khresmoi – multilingual semantic search of medical text and images

Author: Aswani Niraj
Beckers Thomas
Birngruber Erich
Boyer Célia
Burner Andreas
Bystroň Jakub
Choukri Khalid
Cruchet Sarah
Cunningham Hamish
Dolamic Ljiljana
Donner René
Dungs Sebastian
Dědek Jan
Eggel Ivan
Foncubierta-Rodríguez Antonio
Fuhr Norbert
Funk Adam
Garcia Seco De Herrera Alba
Gaudinat Arnaud
Georgiev Georgi
Gobeill Julien
Goeuriot Lorraine
Greenwood Mark
Gschwandtner Manfred
Gómez Paz
Hajič Jan
Hanbury Allan
Hlaváčová Jaroslava
Holzer Markus
Jones Gareth
Jordan Blanca
Jordan Matthias
Kaderk Klemens
Kainberger Franz
Kelly Liadh
Kriewel Sascha
Kritz Marlene
Langs Georg
Lawson Nolan
Markonis Dimitrios
Martinez Ivan
Masselot Alexandre
Mazo Hélène
Momtchev Vassil
Müller Henning
Palotti João
Pecina Pavel
Pentchev Konstantin
Peychev Deyan
Pletneva Natalia
Pottecher Diana
Roberts Angus
Ruch Patrick
Sachs Alexander
Samwald Matthias
Schneller Priscille
Stefanov Veronika
Tinte Miguel A
Urešová Zdeňka
Vargas Alejandro
Vishnyakova Dina
Publication date: 01/01/2013

The Khresmoi project is developing a multilingual multimodal search and access system for medical and health information and documents. This scientific demonstration presents the current state of the Khresmoi integrated system, which includes components for text and image annotation, semantic search, search by image similarity and machine translation. The flexibility in adapting the system to varying requirements for different types of medical information search is demonstrated through two instantiations of the system, one aimed at medical professionals in general and the second aimed at radiologists. The key innovations of the Khresmoi system are the integration of multiple software components in a flexible, scalable medical search system, the use of annotation cycles including manual correction to improve semantic search, and the possibility to do large-scale visual similarity search on 2D and 3D (CT, MR) medical images

University of Essex Research Repository

Biblio at Institute of Formal and Applied Linguistics

Relatório de estágio em farmácia comunitária (Community pharmacy internship report)

Author: Abrams Mitchell
Ackermann Elia
Aepli Noëmi
Aghaei Hamid
Agić Željko
Ahmadi Amir
Ahrenberg Lars
Ajede Chika Kennedy
Aleksandravičiūtė Gabrielė
Alfina Ika
Antonsen Lene
Aplonova Katya
Aquino Angelina
Aragon Carolina
Aranzabe Maria Jesus
Arnardóttir Þórunn
Arutie Gashaw
Arwidarasti Jessica Naraiswari
Asahara Masayuki
Ateyah Luma
Atmaca Furkan
Attia Mohammed
Atutxa Aitziber
Augustinus Liesbeth
Badmaeva Elena
Balasubramani Keerthana
Ballesteros Miguel
Banerjee Esha
Bank Sebastian
Barbu Mititelu Verginica
Basmov Victoria
Batchelor Colin
Bauer John
Bedir Seyyit Talha
Bengoetxea Kepa
Berk Gözde
Berzak Yevgeni
Bhat Irshad Ahmad
Bhat Riyaz Ahmad
Biagetti Erica
Bick Eckhard
Bielinskienė Agnė
Bjarnadóttir Kristín
Blokland Rogier
Bobicev Victoria
Boizou Loïc
Borges Völker Emanuel
Bosco Cristina
Bouma Gosse
Bowman Sam
Boyd Adriane
Brokaitė Kristina
Burchardt Aljoscha
Börstell Carl
Candito Marie
Caron Bernard
Caron Gauthier
Cavalcanti Tatiana
Cebiroğlu Eryiğit Gülşen
Cecchini Flavio Massimiliano
Celano Giuseppe G. A.
Cetin Savas
Chalub Fabricio
Chi Ethan
Cho Yongseok
Choi Jinho
Chun Jayeol
Cignarella Alessandra T.
Cinková Silvie
Collomb Aurélie
Connor Miriam
Courtin Marine
Davidson Elizabeth
de Marneffe Marie-Catherine
de Paiva Valeria
de Souza Elvis
Derin Mehmet Oguz
Diaz de Ilarraza Arantza
Dickerson Carly
Dinakaramani Arawinda
Dione Bamba
Dirix Peter
Dobrovoljc Kaja
Dozat Timothy
Droganova Kira
Dwivedi Puneet
Eckhoff Hanne
Eli Marhaba
Elkahky Ali
Ephrem Binyam
Erina Olga
Erjavec Tomaž
Etienne Aline
Evelyn Wograine
Facundes Sidney
Farkas Richárd
Fernanda Marília
Fernandez Alcalde Hector
Foster Jennifer
Freitas Cláudia
Fujita Kazunori
Gajdošová Katarína
Galbraith Daniel
Garcia Marcos
Garza Sebastian
Gerardi Fabrício Ferraz
Gerdes Kim
Ginter Filip
Goenaga Iakes
Gojenola Koldo
Goldberg Yoav
González Saavedra Berta
Griciūtė Bernadeta
Grioni Matias
Grobol Loïc
Grūzītis Normunds
Guillaume Bruno
Guillot-Barbance Céline
Gärdenfors Moa
Gómez Guinovart Xavier
Gökırmak Memduh
Güngör Tunga
Habash Nizar
Hafsteinsson Hinrik
Hajič jr. Jan
Hajič Jan
Han Na-Rae
Hanifmuti Muhammad Yudistira
Hardwick Sam
Harris Kim
Haug Dag
Heinecke Johannes
Hellwig Oliver
Hennig Felix
Hladká Barbora
Hlaváčová Jaroslava
Hociung Florinel
Hohle Petter
Huber Eva
Hwang Jena
Hà Mỹ Linh
Hämäläinen Mika
Ikeda Takumi
Ingason Anton Karl
Ion Radu
Irimia Elena
Ishola Ọlájídé
Jelínek Tomáš
Johannsen Anders
Juutinen Markus
Jónsdóttir Hildur
Jørgensen Fredrik
K Sarveswaran
Kaasen Andre
Kabaeva Nadezhda
Kahane Sylvain
Kanayama Hiroshi
Kanerva Jenna
Katz Boris
Kayadelen Tolga
Kaşıkara Hüner
Kenney Jessica
Kettnerová Václava
Kirchner Jesse
Klementieva Elena
Kopacewicz Kamil
Korkiakangas Timo
Kotsyba Natalia
Kovalevskaitė Jolanta
Krek Simon
Krishnamurthy Parameswari
Kwak Sookyoung
Köhn Arne
Köksal Abdullatif
Laippala Veronika
Lam Lucia
Lambertino Lorenzo
Lando Tatiana
Larasati Septina Dian
Lavrentiev Alexei
Lee John
Lenci Alessandro
Lertpradit Saran
Leung Herman
Levina Maria
Li Cheuk Ying
Li Josie
Li Keying
Li Yuan
Lim KyungTae
Lindén Krister
Ljubešić Nikola
Loginova Olga
Luthfi Andry
Luukko Mikko
Lyashevskaya Olga
Lynn Teresa
Lê Hồng Phương
Macketanz Vivien
Makazhanov Aibek
Mandl Michael
Manning Christopher
Manurung Ruli
Mareček David
Marheinecke Katrin
Martins André
Martínez Alonso Héctor
Matsuda Hiroshi
Matsumoto Yuji
Mašek Jan
McDonald Ryan
McGuinness Sarah
Mendonça Gustavo
Miekka Niko
Mischenkova Karina
Misirpashayeva Margarita
Missilä Anna
Mititelu Cătălin
Mitrofan Maria
Miyao Yusuke
Mojiri Foroushani AmirHossein
Moloodi Amirsaeid
Montemagni Simonetta
More Amir
Moreno Romero Laura
Mori Keiko Sophie
Mori Shinsuke
Morioka Tomohiko
Moro Shigeki
Mortensen Bjartur
Moskalevskyi Bohdan
Muischnek Kadri
Munro Robert
Murawaki Yugo
Müürisep Kaili
Mărănduc Cătălina
Nainwani Pinkey
Nakhlé Mariam
Navarro Horñiacek Juan Ignacio
Nedoluzhko Anna
Nešpore-Bērzkalne Gunta
Nguyễn Thị Minh Huyền
Nguyễn Thị Lương
Nikaido Yoshihiro
Nikolaev Vitaly
Nitisaroj Rattima
Nivre Joakim
Nourian Alireza
Nurmi Hanna
Ojala Stina
Ojha Atul Kr.
Olúòkun Adédayọ̀
Omura Mai
Onwuegbuzia Emeka
Osenova Petya
Partanen Niko
Pascual Elena
Passarotti Marco
Patejuk Agnieszka
Paulino-Passos Guilherme
Peljak-Łapińska Angelika
Peng Siyao
Perez Cenel-Augusto
Perkova Natalia
Perrier Guy
Petrov Slav
Petrova Daria
Phelan Jason
Piitulainen Jussi
Pirinen Tommi A
Pitler Emily
Plank Barbara
Poibeau Thierry
Ponomareva Larisa
Popel Martin
Pretkalniņa Lauma
Prokopidis Prokopis
Przepiórkowski Adam
Prévost Sophie
Puolakainen Tiina
Pyysalo Sampo
Qi Peng
Rademaker Alexandre
Rama Taraka
Ramasamy Loganathan
Ramisch Carlos
Rashel Fam
Rasooli Mohammad Sadegh
Ravishankar Vinit
Real Livy
Rebeja Petru
Reddy Siva
Rehm Georg
Riabov Ivan
Rießler Michael
Rimkutė Erika
Rinaldi Larissa
Rituma Laura
Rocha Luisa
Romanenko Mykhailo
Rosa Rudolf
Rovati Davide
Roșca Valentin
Rudina Olga
Rueter Jack
Rääbis Andriela
Rögnvaldsson Eiríkur
Rúnarsson Kristján
Sadde Shoval
Safari Pegah
Sagot Benoît
Sahala Aleksi
Saleh Shadi
Salomoni Alessio
Samardžić Tanja
Samson Stephanie
Sanguinetti Manuela
Saulīte Baiba
Sawanakunanon Yanin
Scannell Kevin
Scarlata Salvatore
Schneider Nathan
Schuster Sebastian
Seddah Djamé
Seeker Wolfgang
Seraji Mojgan
Shen Mo
Shimada Atsuko
Shirasu Hiroyuki
Shohibussirri Muh
Sichinava Dmitry
Sigurðsson Einar Freyr
Silveira Aline
Silveira Natalia
Simi Maria
Simionescu Radu
Simkó Katalin
Simov Kiril
Skachedubova Maria
Smith Aaron
Soares-Bastos Isabela
Spadine Carolyn
Steingrímsson Steinþór
Stella Antonio
Straka Milan
Strickland Emmett
Strnadová Jana
Suhr Alane
Sulestio Yogi Lesmana
Sulubacak Umut
Suzuki Shingo
Szántó Zsolt
Särg Dage
Taji Dima
Takahashi Yuta
Tamburini Fabio
Tan Mary Ann C.
Tanaka Takaaki
Tella Samson
Tellier Isabelle
Thomas Guillaume
Torga Liisi
Toska Marsida
Trosterud Trond
Trukhina Anna
Tsarfaty Reut
Tyers Francis
Türk Utku
Uematsu Sumire
Untilov Roman
Urešová Zdeňka
Uria Larraitz
Uszkoreit Hans
Utka Andrius
Vajjala Sowmya
van Niekerk Daniel
van Noord Gertjan
Varga Viktor
Villemonte de la Clergerie Eric
Vincze Veronika
Wakasa Aya
Wallenberg Joel C.
Wallin Lars
Walsh Abigail
Wang Jing Xian
Washington Jonathan North
Wendt Maximilan
Widmer Paul
Williams Seyi
Wirén Mats
Wittern Christian
Woldemariam Tsegay
Wong Tak-sum
Wróblewska Alina
Yako Mary
Yamashita Kayo
Yamazaki Naoki
Yan Chunxiao
Yasuoka Koichi
Yavrumyan Marat M.
Yu Zhuoran
Zahra Shorouq
Zeldes Amir
Zeman Daniel
Zhu Hanzhi
Zhuravleva Anna
Çetinoğlu Özlem
Çöltekin Çağrı
Östling Robert
Özateş Şaziye Betül
Özgür Arzucan
Öztürk Başaran Balkız
Øvrelid Lilja
Čéplö Slavomír
Šimková Mária
Žabokrtský Zdeněk
Publication date: 01/09/2016

Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, presented to the Faculty of Pharmacy of the University of Coimbra

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Formalization of the Czech morphology system with respect to automatic processing of Czech texts

Author: Hlaváčová Jaroslava
Publication date: 01/01/2009

Detailed morphological description of word forms is one of the most important conditions for successful automatic processing of linguistic data. The system of categories and their values used for the description is the subject of the first part of the thesis. The basic principle, the so-called Golden Rule of Morphology, states that every word form has to be described by the system unambiguously. The existence of variants of word forms and of whole paradigms, however, complicates the fulfilment of this rule. We introduce so-called mutations as an extension of variants, so that other sets of word forms with the same description can be included (for instance, multiple word forms of Czech personal pronouns). We divide mutations into two kinds: global ones, describing all word forms of a paradigm, and inflectional ones, for description at the word-form level. This division enables us to express their various combinations. We do not use features of style for the mutation division, for they are subjective. With consistent use of the categories called Inflectional Mutation and Global Mutation, the Golden Rule of Morphology will always be valid. The concept of multiple lemma is introduced in a chapter dealing with lemmatization. It describes lemma variants. We give a detailed description of so-called...

CU Digital Repository

National Repository of Grey Literature